code algorithm
TSIS: A Supplementary Algorithm to t-SMILES for Fragment-based Molecular Representation
Wu, Juan-Ni, Wang, Tong, Tang, Li-Juan, Wu, Hai-Long, Yu, Ru-Qin
String-based molecular representations, such as SMILES, are a de facto standard for linearly representing molecular information. However, the requirement that certain symbols appear in matched pairs, together with the parsing algorithm, results in long grammatical dependencies, making it difficult for even state-of-the-art deep learning models to accurately comprehend the syntax and semantics. Although DeepSMILES and SELFIES have addressed certain limitations, they still struggle with advanced grammar, which makes some strings difficult to read. This study introduces a supplementary algorithm, TSIS (TSID Simplified), to the t-SMILES family. Comparative experiments between TSIS and another fragment-based linear solution, SAFE, indicate that SAFE has difficulty managing long-term dependencies in grammar. TSIS continues to use the tree defined in t-SMILES as its foundational data structure and encoding logic, which sets it apart from the SAFE model. TSIS models outperform SAFE models, indicating that the algorithm of the t-SMILES family provides certain advantages.
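To make the "paired symbols" point concrete, the sketch below measures how far apart matching symbols sit in a plain SMILES string: branch parentheses and ring-closure labels. It is only an illustration of the dependency problem the abstract describes, not an implementation of TSIS, t-SMILES, or SAFE. The function names `paired_symbol_spans` and `longest_dependency` are hypothetical, and the scanner is a deliberately simplified assumption: it skips bracket atoms and handles `%nn` ring labels, but it does not validate SMILES.

```python
from typing import Dict, List, Tuple


def paired_symbol_spans(smiles: str) -> List[Tuple[str, int, int]]:
    """Return (symbol, open_index, close_index) for each matched pair.

    Simplified scanner (an assumption, not a full SMILES parser): it pairs
    branch parentheses and ring-closure labels and ignores everything else.
    """
    spans: List[Tuple[str, int, int]] = []
    branch_stack: List[int] = []      # indices of '(' still waiting for ')'
    open_rings: Dict[str, int] = {}   # ring label -> index where it opened
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Digits inside a bracket atom are isotopes, H counts, or charges,
            # not ring closures, so skip the whole bracket atom.
            i = smiles.index("]", i) + 1
            continue
        if ch == "(":
            branch_stack.append(i)
        elif ch == ")":
            spans.append(("()", branch_stack.pop(), i))
        elif ch == "%" or ch.isdigit():
            # '%nn' is a two-digit ring label; a bare digit is a one-digit label.
            if ch == "%":
                label, width = smiles[i + 1:i + 3], 3
            else:
                label, width = ch, 1
            if label in open_rings:
                spans.append((label, open_rings.pop(label), i))
            else:
                open_rings[label] = i
            i += width
            continue
        i += 1
    return spans


def longest_dependency(smiles: str) -> int:
    """Largest character distance between any two paired symbols (0 if none)."""
    return max((close - start for _, start, close in paired_symbol_spans(smiles)),
               default=0)


if __name__ == "__main__":
    for s in ("c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"):
        print(s, paired_symbol_spans(s), longest_dependency(s))
```

In benzene (`c1ccccc1`) the two `1` labels that close the ring are six characters apart; for larger fused or macrocyclic systems that distance grows with the size of the ring, which is the kind of long-range dependency a sequence model has to track.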
Code Algorithms
Originally published on Towards AI. As long as coding and programming are used, algorithms will be at the heart of these technologies, defining what they do and how they do it.